{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "6cf76092",
   "metadata": {},
   "source": [
    "# MERA\n",
    "MERA (Multimodal Evaluation for Russian-language Architectures) is a new open benchmark for the Russian language for evaluating fundamental models.\n",
    "\n",
    "The MERA benchmark includes 21 text tasks (17 base tasks + 4 diagnostic tasks). See the [task-table](https://mera.a-ai.ru/tasks) for a complete list."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fc8d0425",
   "metadata": {},
   "source": [
    "## Install"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "178da903",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install datasets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "faa3da64",
   "metadata": {},
   "outputs": [],
   "source": [
    "!git clone https://github.com/ai-forever/MERA\n",
    "%cd MERA"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "50cf7af2",
   "metadata": {},
   "source": [
    "### DATASETS"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "5accaeb0",
   "metadata": {},
   "outputs": [],
   "source": [
    "from datasets import load_dataset \n",
    "# dataset names in low-case\n",
    "task_name = [\"simplear\", \n",
    "             \"rwsd\", \n",
    "             \"rumultiar\", \n",
    "             \"rumodar\", \n",
    "             \"rutie\", \n",
    "             \"rummlu\", \n",
    "             \"ruhumaneval\", \n",
    "             \"ruhatespeech\", \n",
    "             \"rcb\", \n",
    "             \"lcs\", \n",
    "             \"bps\", \n",
    "             \"rudetox\", \n",
    "             \"ruethics\", \n",
    "             \"ruhhh\", \n",
    "             \"use\", \n",
    "             \"parus\", \n",
    "             \"mathlogicqa\", \n",
    "             \"ruopenbookqa\", \n",
    "             \"ruworldtree\", \n",
    "             \"multiq\", \n",
    "             \"chegeka\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "88a45569",
   "metadata": {},
   "source": [
    "### The general structure of set\n",
    "\n",
    "More detailed description can be found in the dataset card [ai-forever/MERA](https://huggingface.co/datasets/ai-forever/MERA).\n",
    "\n",
    "`instruction` — an instructional prompt specified for the current example;\n",
    "\n",
    "`inputs` — input data for one example: must be inserted into the prompt (query);\n",
    "\n",
    "`outputs` — target value, containing the answer;\n",
    "\n",
    "`meta` — additional information field:\n",
    "\n",
    "* `id` — an ordered number of the example from the dataset;\n",
    "*    ..."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4519ae81",
   "metadata": {},
   "source": [
    "## LOAD DATASET from [huggingface](https://huggingface.co/datasets/ai-forever/MERA)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f2057016",
   "metadata": {},
   "outputs": [],
   "source": [
    "### in case of errors run command and restart kernel\n",
    "# !rm -r ~/.cache/ "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "cc281523",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "i = 5\n",
    "DATASET_PATH = \"ai-forever/MERA\"\n",
    "DATASET_NAME = f\"{task_name[i]}\"\n",
    "df = load_dataset(DATASET_PATH, \n",
    "                  DATASET_NAME, download_mode=\"force_redownload\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "8d045e53",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "DatasetDict({\n",
       "    train: Dataset({\n",
       "        features: ['instruction', 'inputs', 'outputs', 'meta'],\n",
       "        num_rows: 10033\n",
       "    })\n",
       "    test: Dataset({\n",
       "        features: ['instruction', 'inputs', 'outputs', 'meta'],\n",
       "        num_rows: 961\n",
       "    })\n",
       "})"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "885d68da",
   "metadata": {},
   "source": [
    "### TRAIN set with answers for n-shot "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "35bd5d13",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>instruction</th>\n",
       "      <th>inputs</th>\n",
       "      <th>outputs</th>\n",
       "      <th>meta</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Задание содержит вопрос по теме {subject} и 4 ...</td>\n",
       "      <td>{'text': 'В какой из этих двух ситуаций действ...</td>\n",
       "      <td>B</td>\n",
       "      <td>{'domain': 'moral_scenarios', 'id': 0}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Задание содержит вопрос по теме {subject} и 4 ...</td>\n",
       "      <td>{'text': 'В какой из этих двух ситуаций действ...</td>\n",
       "      <td>C</td>\n",
       "      <td>{'domain': 'moral_scenarios', 'id': 1}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Ниже приведен вопрос на определенную профессио...</td>\n",
       "      <td>{'text': 'В какой из этих двух ситуаций действ...</td>\n",
       "      <td>C</td>\n",
       "      <td>{'domain': 'moral_scenarios', 'id': 2}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Найдите правильный ответ на вопрос по теме {su...</td>\n",
       "      <td>{'text': 'В какой из этих двух ситуаций действ...</td>\n",
       "      <td>A</td>\n",
       "      <td>{'domain': 'moral_scenarios', 'id': 3}</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                         instruction  \\\n",
       "0  Задание содержит вопрос по теме {subject} и 4 ...   \n",
       "1  Задание содержит вопрос по теме {subject} и 4 ...   \n",
       "2  Ниже приведен вопрос на определенную профессио...   \n",
       "3  Найдите правильный ответ на вопрос по теме {su...   \n",
       "\n",
       "                                              inputs outputs  \\\n",
       "0  {'text': 'В какой из этих двух ситуаций действ...       B   \n",
       "1  {'text': 'В какой из этих двух ситуаций действ...       C   \n",
       "2  {'text': 'В какой из этих двух ситуаций действ...       C   \n",
       "3  {'text': 'В какой из этих двух ситуаций действ...       A   \n",
       "\n",
       "                                     meta  \n",
       "0  {'domain': 'moral_scenarios', 'id': 0}  \n",
       "1  {'domain': 'moral_scenarios', 'id': 1}  \n",
       "2  {'domain': 'moral_scenarios', 'id': 2}  \n",
       "3  {'domain': 'moral_scenarios', 'id': 3}  "
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['train'].to_pandas().head(4)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18f5b974",
   "metadata": {},
   "source": [
    "### TEST set for evaluation (with LM-Harness code) \n",
    "\n",
    "There are no answers except ethics diagnostic sets: ruDetox, ruEthics, ruHHH, ruHateSpeech."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "992bf1dc",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>instruction</th>\n",
       "      <th>inputs</th>\n",
       "      <th>outputs</th>\n",
       "      <th>meta</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Задание содержит вопрос по теме {subject} и 4 ...</td>\n",
       "      <td>{'text': 'Цифровой записью числа восемь миллио...</td>\n",
       "      <td></td>\n",
       "      <td>{'domain': 'elementary_mathematics', 'id': 0}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Задание содержит вопрос по теме {subject} и 4 ...</td>\n",
       "      <td>{'text': 'Найди скорость машины, которая проех...</td>\n",
       "      <td></td>\n",
       "      <td>{'domain': 'elementary_mathematics', 'id': 1}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Ниже приведен вопрос на определенную профессио...</td>\n",
       "      <td>{'text': 'Саше дали 5 яблок, потом он забрал е...</td>\n",
       "      <td></td>\n",
       "      <td>{'domain': 'elementary_mathematics', 'id': 2}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Найдите правильный ответ на вопрос по теме {su...</td>\n",
       "      <td>{'text': 'В фирме работает 1500 человек, из ни...</td>\n",
       "      <td></td>\n",
       "      <td>{'domain': 'elementary_mathematics', 'id': 3}</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                         instruction  \\\n",
       "0  Задание содержит вопрос по теме {subject} и 4 ...   \n",
       "1  Задание содержит вопрос по теме {subject} и 4 ...   \n",
       "2  Ниже приведен вопрос на определенную профессио...   \n",
       "3  Найдите правильный ответ на вопрос по теме {su...   \n",
       "\n",
       "                                              inputs outputs  \\\n",
       "0  {'text': 'Цифровой записью числа восемь миллио...           \n",
       "1  {'text': 'Найди скорость машины, которая проех...           \n",
       "2  {'text': 'Саше дали 5 яблок, потом он забрал е...           \n",
       "3  {'text': 'В фирме работает 1500 человек, из ни...           \n",
       "\n",
       "                                            meta  \n",
       "0  {'domain': 'elementary_mathematics', 'id': 0}  \n",
       "1  {'domain': 'elementary_mathematics', 'id': 1}  \n",
       "2  {'domain': 'elementary_mathematics', 'id': 2}  \n",
       "3  {'domain': 'elementary_mathematics', 'id': 3}  "
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['test'].to_pandas().head(4)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ed7c1c5a",
   "metadata": {},
   "source": [
    "### obtaining query for few-shot evaluation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "aeaf67df",
   "metadata": {},
   "outputs": [],
   "source": [
    "doc = df['train'].to_pandas().loc[300]\n",
    "prompt = doc[\"instruction\"].format(**doc[\"inputs\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "de967612",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'professional_medicine'"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# rummlu evaluation by domain:\n",
    "domain = doc[\"meta\"][\"domain\"]\n",
    "domain"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "c0960547",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Задание содержит вопрос по теме Медицина и 4 варианта ответа A, B, C, D, из которых только один правильный.\n",
      "4-летнего мальчика доставили к врачу из-за температуры до 39,4°C (102,9°F) в течение 8 дней. Обследование показывает переднюю шейную лимфаденопатию, двусторонний неэкссудативный конъюнктивит, земляничный язык, эритематозную сыпь на туловище и отек кистей и стоп. Какая из перечисленных ниже фармакотерапий является наиболее подходящей для предотвращения осложнений этого заболевания?\n",
      "A Внутривенное введение иммуноглобулина\n",
      "B Пенициллин для внутривенного введения\n",
      "C Внутривенное введение преднизолона\n",
      "D Пероральный изониазид\n",
      "Запишите букву правильного ответа\n",
      "Ответ:\n"
     ]
    }
   ],
   "source": [
    "# model query \n",
    "print(prompt)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "09ac1832",
   "metadata": {},
   "source": [
    "## LOAD DATASET from datasets local folder\n",
    "You can load data zip with all dataset files (task.json files) from [MERA](https://mera.a-ai.ru/tasks). Press \"Скачать все задачи\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "4774639a",
   "metadata": {},
   "outputs": [],
   "source": [
    "path_to_datasets = \"MERA_datasets/\"\n",
    "import os \n",
    "task_name = os.listdir(path_to_datasets)\n",
    "i = 5\n",
    "DATASET_PATH = f\"{path_to_datasets}{task_name[i]}\"\n",
    "DATASET_NAME = f\"{task_name[i]}\"\n",
    "df = load_dataset(\n",
    "    path=DATASET_PATH,\n",
    "    name=DATASET_NAME, download_mode=\"force_redownload\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "7003a0fa",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "DatasetDict({\n",
       "    train: Dataset({\n",
       "        features: ['instruction', 'inputs', 'outputs', 'meta'],\n",
       "        num_rows: 10033\n",
       "    })\n",
       "    test: Dataset({\n",
       "        features: ['instruction', 'inputs', 'outputs', 'meta'],\n",
       "        num_rows: 961\n",
       "    })\n",
       "})"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
